DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model
Authors
Abstract
Speech-driven gesture synthesis is a field of growing interest in virtual human creation. However, a critical challenge is the inherent, intricate one-to-many mapping between speech and gestures. Previous studies have explored generative models and achieved significant progress with them. Nevertheless, most synthesized gestures are still far less natural. This paper presents DiffMotion, a novel speech-driven gesture synthesis architecture based on diffusion models. The model comprises an autoregressive temporal encoder and a denoising diffusion probability module. The encoder extracts the temporal context of the speech input and the historical gestures. The diffusion module learns a parameterized Markov chain that gradually converts a simple distribution into a complex distribution and generates gestures according to the accompanying speech. Compared with baselines, objective and subjective evaluations confirm that our approach can produce natural and diverse gesticulation and demonstrate the benefits of diffusion-based models for speech-driven gesture synthesis. Project page: https://github.com/zf223669/DiffMotion.
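The "parameterized Markov chain" in the abstract can be illustrated with a minimal sketch of a generic denoising diffusion process. This is not the authors' implementation. The linear variance schedule, the function names, and the `predict_noise` callback are all assumptions made for illustration. In a DiffMotion-style model, `predict_noise` would presumably be a trained network conditioned on the encoder's context, and the vector `x0` would stand for one gesture pose.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Variance schedule beta_1..beta_N (linear spacing is an assumption)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_n = product over i <= n of (1 - beta_i)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_diffuse(x0, n, alpha_bars, rng):
    """Sample x_n from q(x_n | x_0) in closed form; returns (x_n, noise)."""
    a = alpha_bars[n]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xn = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xn, noise


def reverse_step(xn, n, betas, alpha_bars, predict_noise, context, rng):
    """One ancestral sampling step x_n -> x_{n-1}.

    predict_noise(xn, n, context) is a hypothetical stand-in for the
    learned denoiser; context stands in for the encoder output.
    """
    beta, a_bar = betas[n], alpha_bars[n]
    eps = predict_noise(xn, n, context)
    scale = beta / math.sqrt(1.0 - a_bar)
    mean = [(x - scale * e) / math.sqrt(1.0 - beta) for x, e in zip(xn, eps)]
    if n == 0:
        return mean  # no noise is added at the final step
    sigma = math.sqrt(beta)  # one common choice of reverse variance
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
```

Sampling would start from Gaussian noise (the simple distribution) and apply `reverse_step` for n = N-1 down to 0, so that the chain ends at a sample from the learned (complex) distribution.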
Similar resources
Data-driven Speech Denoising Using Noise Profiles
This paper describes a targeted, undemanding data-driven signal processing approach to identify, control, and suppress a specific background noise that is present in a recording together with a spoken utterance. A background noise (e.g., the sound of an engine on board a bus) degrades ASR system performance by distorting the speech signal spectrum. Thus it is necessary to p...
Combined Gesture-Speech Recognition and Synthesis Using Neural Networks
Sign languages such as Spanish Sign Language (LSE) are the primary means of communication among members of the Deaf community. However, these languages are not widely known outside of this community. The techniques for automatically recognizing hand signs proposed in this paper allow creating systems which can help deaf people to communicate with others, by providing them with computer tools for assisted co...
Towards Natural Gesture Synthesis: Evaluating Gesture Units in a Data-Driven Approach to Gesture Synthesis
Virtual humans still lack naturalness in their nonverbal behaviour. We present a data-driven solution that moves towards a more natural synthesis of hand and arm gestures by recreating gestural behaviour in the style of a human performer. Our algorithm exploits the concept of gesture units to make the produced gestures a continuous flow of movement. We empirically validated the use of gesture u...
Combined Gesture-Speech Analysis and Synthesis
Multi-modal speech and speaker modelling and recognition are widely accepted as vital aspects of state of the art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modelling of head, hand and...
Data Driven Gesture Model Acquisition using Minimum Description Length
An approach is presented to automatically segment and label a continuous observation sequence of hand gestures for a complete unsupervised model acquisition. The method is based on the assumption that gestures can be viewed as repetitive sequences of atomic components, similar to phonemes in speech, starting and ending in a rest position and governed by a high level structure controlling the te...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-27077-2_18